# Video Caption Generation

Tarsier-34b is an open-source, large-scale video-language model focused on generating high-quality video captions. It achieves leading results on multiple public benchmarks.

Timesformer Bert Video Captioning

A video captioning model based on the TimeSformer and BERT architectures that generates descriptive captions for video content.

Git Large Vatex

GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens, designed for tasks such as image and video captioning and visual question answering.
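To make "a decoder conditioned on image tokens and text tokens" concrete, the sketch below builds the kind of attention mask such a model can use: image tokens attend to each other freely, while text tokens see every image token but only earlier text tokens, so captions are generated left to right. This is a simplified illustration assuming that masking scheme (the function name `git_style_attention_mask` is hypothetical), not code taken from the GIT release or any library.

```python
from typing import List


def git_style_attention_mask(num_image_tokens: int, num_text_tokens: int) -> List[List[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed layout (illustrative only): positions 0..num_image_tokens-1 hold
    image tokens, and the remaining positions hold text tokens.
    """
    total = num_image_tokens + num_text_tokens
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if i < num_image_tokens:
                # Image tokens attend bidirectionally, but only to other image tokens.
                mask[i][j] = j < num_image_tokens
            else:
                # Text tokens see all image tokens plus earlier/current text tokens (causal).
                mask[i][j] = j < num_image_tokens or j <= i
    return mask


if __name__ == "__main__":
    # 2 image tokens followed by 3 text tokens; "1" = may attend, "." = masked.
    for row in git_style_attention_mask(2, 3):
        print("".join("1" if allowed else "." for allowed in row))
    # prints:
    # 11...
    # 11...
    # 111..
    # 1111.
    # 11111
```

Keeping image tokens blind to the text means the visual representation does not change as the caption grows, while the causal text block is what lets the same decoder generate one token at a time.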

GIT is a Transformer-based generative image-to-text model. This base version is fine-tuned on the VATEX dataset and is suited to tasks such as image and video captioning.
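Because GIT is an image-to-text model, using it for video captioning implies reducing each clip to a small, fixed number of frames before encoding. The sketch below shows one simple way to pick those frames, assuming uniform sampling from the middle of equal-length segments; the helper name `sample_frame_indices` is hypothetical, and the number of frames a VATEX checkpoint actually expects is not stated here, so it is left as a parameter.

```python
from typing import List


def sample_frame_indices(num_frames_in_video: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly across a video.

    Assumption: the video is split into num_samples equal segments and the
    middle frame of each segment is taken. Short videos may repeat frames.
    """
    if num_frames_in_video <= 0 or num_samples <= 0:
        raise ValueError("both arguments must be positive")
    step = num_frames_in_video / num_samples
    # Center of segment k, clamped to the last valid frame index.
    return [min(int(step * k + step / 2), num_frames_in_video - 1) for k in range(num_samples)]


if __name__ == "__main__":
    print(sample_frame_indices(60, 6))  # prints [5, 15, 25, 35, 45, 55]
```

Taking segment centers rather than segment starts avoids biasing the selection toward the first frames of the clip; random offsets within each segment are a common alternative during training.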

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase